Homograph ambiguity resolution in front-end design for Portuguese TTS systems

Authors

  • Daniela Braga
  • Luís Pinto Coelho
  • Fernando Gil Vianna Resende
Abstract

In this paper, a module for homograph disambiguation in Portuguese Text-to-Speech (TTS) is proposed. The module combines a part-of-speech (POS) parser, used to disambiguate homographs that belong to different parts of speech, with a semantic analyzer, used to disambiguate homographs that belong to the same part of speech. The proposed algorithms are meant to resolve a significant part of homograph ambiguity in European Portuguese (EP) (106 homograph pairs so far). The system is ready to be integrated into a Letter-to-Sound (LTS) converter. The algorithms were trained and tested with different corpora. The experimental results yielded an accuracy rate of 97.8%. The methodology is also valid for Brazilian Portuguese (BP), since 95 of the homograph pairs are exactly the same as in EP. A comparison with a probabilistic approach was also carried out, and the results are discussed.
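
The two-stage idea in the abstract (POS first, semantics for same-POS pairs) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the toy lexicons, the cue-word sets, the one-token-lookback `guess_pos` stand-in for a real POS parser, and the bracketed vowel labels are all hypothetical, and a real system would cover the full set of homograph pairs with trained models.

```python
from typing import List, Optional

# Hypothetical toy lexicon: homographs whose readings differ by part of speech.
POS_HOMOGRAPHS = {
    "gosto": {"NOUN": "g[o]sto", "VERB": "g[ɔ]sto"},
}

# Hypothetical toy lexicon: same-POS homographs, each reading with context cue words.
SEMANTIC_HOMOGRAPHS = {
    "sede": {
        "s[e]de": {"água", "beber", "matar"},       # "thirst"
        "s[ɛ]de": {"empresa", "partido", "clube"},  # "headquarters"
    },
}

DETERMINERS = {"o", "um", "este", "esse", "meu", "teu", "seu"}
SUBJECT_PRONOUNS = {"eu"}


def guess_pos(tokens: List[str], i: int) -> Optional[str]:
    """Crude stand-in for a POS parser: inspects only the previous token."""
    prev = tokens[i - 1].lower() if i > 0 else ""
    if prev in DETERMINERS:
        return "NOUN"
    if prev in SUBJECT_PRONOUNS:
        return "VERB"
    return None


def disambiguate(tokens: List[str], i: int) -> Optional[str]:
    """Return a pronunciation label for tokens[i], or None if undecided."""
    word = tokens[i].lower()
    # Stage 1: readings that belong to different parts of speech.
    if word in POS_HOMOGRAPHS:
        return POS_HOMOGRAPHS[word].get(guess_pos(tokens, i))
    # Stage 2: same part of speech, so count cue words in the sentence.
    if word in SEMANTIC_HOMOGRAPHS:
        context = {t.lower() for j, t in enumerate(tokens) if j != i}
        scores = {
            reading: len(cues & context)
            for reading, cues in SEMANTIC_HOMOGRAPHS[word].items()
        }
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else None
    return None
```

For example, `disambiguate("Eu gosto de música".split(), 1)` selects the verb reading through the POS stage, while `disambiguate("A sede da empresa".split(), 1)` falls through to the cue-word stage. Returning `None` when neither stage has evidence leaves room for a downstream default in the LTS converter.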

Similar resources

A Context-Sensitive Homograph Disambiguation in Thai Text-to-Speech Synthesis

Homograph ambiguity is an original issue in Text-to-Speech (TTS). To disambiguate homographs, several efficient approaches have been proposed, such as part-of-speech (POS) n-gram, Bayesian classifier, decision tree, and Bayesian-hybrid approaches. These methods need the words and/or POS tags surrounding the homographs in question for disambiguation. Some languages such as Thai, Chinese, and Japanese have...

Hélia, Heloísa and Helena: new HTS systems in European Portuguese, Brazilian Portuguese and Galician

Hélia, Heloísa and Helena are the new Text-to-Speech (TTS) systems for European Portuguese, Brazilian Portuguese and Galician, respectively, using state of the art HTS (HMM-based speech synthesis) technology developed at Microsoft. The three TTS systems are presented together as they share most of their modules, namely the same back-end engine and most of the front-end rules. The main differenc...

The broad study of homograph disambiguity for Mandarin speech synthesis

How to increase the intelligibility and naturalness of synthetic speech has drawn much attention in recent Mandarin text-to-speech (TTS) research. They have always been treated as a bottleneck because their effects are explicit for human perception. However, as the quality of synthetic speech increases for syllables, words or phrases, there is also an increasing need to improve the various compo...

A stochastic approach to phoneme and accent estimation

We present a new stochastic approach to accurately estimate phonemes and accents for Japanese TTS (Text-to-Speech) systems. The front-end process of a TTS system assigns phonemes and accents to an input plain text, which is critical for creating intelligible and natural speech. Rule-based approaches that build hierarchical structures are widely used for this purpose. However, considering scalability ...

An investigation of working memory influences on lexical ambiguity resolution.

The present study employed a combined semantic judgment and lexical decision priming paradigm to examine the impact of working memory on the inhibitory processes of lexical ambiguity resolution. The results indicated that overall, participants activated one meaning of a presented homograph while not priming the alternative meaning. As hypothesized, participants with high working-memory spans ex...

Publication date: 2007